Privacy-preserving training of tree ensembles over continuous data
Abstract
Most existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the features are categorical. In real-life applications, features are often numerical. The standard “in the clear” algorithm to grow decision trees on data with continuous values requires sorting the training examples for each feature in the quest for an optimal cut-point in the range of feature values in each node. Sorting is an expensive operation in MPC, hence finding secure protocols that avoid such an expensive step is a relevant problem in privacy-preserving machine learning. In this paper we propose three more efficient alternatives for secure training of decision tree based models on data with continuous features, namely: (1) secure discretization of the data, followed by secure training of a decision tree over the discretized data; (2) secure discretization of the data, followed by secure training of a random forest over the discretized data; and (3) secure training of extremely randomized trees (“extra-trees”) on the original data. Approaches (2) and (3) both involve randomizing feature choices. In addition, in approach (3) cut-points are chosen randomly as well, thereby alleviating the need to sort or to discretize the data up front. We implemented all proposed solutions in the semi-honest setting with additive secret sharing based MPC. In addition to mathematically proving that all proposed approaches are correct and secure, we experimentally evaluated and compared them in terms of classification accuracy and runtime. We privately train tree ensembles over data sets with thousands of instances or features in a few minutes, with accuracies that are at par with those obtained in the clear. This makes our solution more efficient than the existing approaches, which are based on oblivious sorting.
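To make the contrast in the abstract concrete, the following is a minimal plaintext ("in the clear") sketch in Python. It is not the paper's MPC protocol, and all function names here are hypothetical. It assumes binary 0/1 labels, Gini impurity as the split criterion, and a feature with at least two distinct values. It compares the classic cut-point search, which sorts the feature values, with an extra-trees style random cut-point, which only needs the minimum and maximum of the feature.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_impurity(values, labels, cut):
    """Weighted Gini impurity after splitting on `value <= cut`."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_cut_sorted(values, labels):
    """Classic search: sort the feature, then score every midpoint.

    Assumes the feature has at least two distinct values.
    """
    xs = sorted(set(values))
    candidates = [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda c: split_impurity(values, labels, c))


def random_cut(values, rng):
    """Extra-trees style: draw a cut uniformly in [min, max]; no sort needed."""
    return rng.uniform(min(values), max(values))
```

On a toy feature such as `[1.0, 2.0, 3.0, 10.0, 11.0, 12.0]` with labels `[0, 0, 0, 1, 1, 1]`, `best_cut_sorted` picks the midpoint between 3.0 and 10.0, while `random_cut(values, random.Random(0))` returns some value in the feature's range without ordering the examples. In an ensemble, many such random cuts are scored and combined, which is why the sorting step can be dropped.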
Similar resources
Privacy-Preserving Decision Tree Classification Over Horizontally Partitioned Data
Protection of privacy is one of the important problems in data mining. The unwillingness of parties to share their data frequently results in the failure of collaborative data mining. This paper studies how to build a decision tree classifier under the following scenario: a database is horizontally partitioned into multiple pieces, with each piece owned by a particular party. All the parties want to build a decis...
Privacy preserving decision tree learning over multiple parties
Data mining over multiple data sources has emerged as an important practical problem with applications in different areas such as data streams, data warehouses, and bioinformatics. Although the data sources are willing to run data mining algorithms in these cases, they do not want to reveal any extra information about their data to other sources due to legal or competition concerns. One possible s...
Secure and Privacy Preserving Outsourcing of Tree Structured Data
With the increasing use of web services, many new challenges concerning data security are becoming critical. Data or applications can now be outsourced to powerful remote servers, which are able to provide services on behalf of the owners. Unfortunately, such hosts may not always be trustworthy. In [1, 2], we presented a one-server computationally private tree traversal technique, which allows ...
Privacy Preserving Data Mining over Vertically Partitioned Data
Vaidya, Jaideep Shrikant. Ph.D., Purdue University, August, 2004. Privacy Preserving Data Mining over Vertically Partitioned Data. Major Professor: Chris Clifton. The goal of data mining is to extract or “mine” knowledge from large amounts of data. However, data is often collected by several different sites. Privacy, legal and commercial concerns restrict centralized access to this data. Theore...
Preserving Privacy of Continuous High-dimensional Data with Minimax Filters
Preserving privacy of high-dimensional and continuous data such as images or biometric data is a challenging problem. This paper formulates this problem as a learning game between three parties: 1) data contributors using a filter to sanitize data samples, 2) a cooperative data aggregator learning a target task using the filtered samples, and 3) an adversary learning to identify contributors us...
Journal
Journal title: Proceedings on Privacy Enhancing Technologies
Year: 2022
ISSN: 2299-0984
DOI: https://doi.org/10.2478/popets-2022-0042